Script Identification in Natural Scene Image and Video Frame using Attention based Convolutional-LSTM Network

Abstract

Script identification facilitates many important applications in document/video analysis. This paper focuses on the problem of script identification in scene text images and video scripts. Because of low image quality, complex backgrounds, and the similar character layouts shared by some scripts such as Greek and Latin, text recognition in such scenarios is difficult. Most recent approaches apply either a patch-based CNN with summation of the obtained features, or only a CNN-LSTM network, to get the identification result. Some use a discriminative CNN to jointly optimize mid-level representations and deep features. In this paper, we propose a novel method that extracts local and global features using a CNN-LSTM framework and weights them dynamically for script identification. First, we convert the images into patches and feed them into a CNN-LSTM framework. Attention-based patch weights are calculated by applying a softmax layer after the LSTM. We then multiply these weights patch-wise with the corresponding CNN features to yield local features. Global features are extracted from the last cell state of the LSTM. We employ a fusion technique that dynamically weights the local and global features for each individual patch. Experiments were conducted on two public script identification datasets, SIW-13 and CVSI2015. Our learning procedure achieves superior performance compared with previous approaches.
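The weighting and fusion steps described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the form of the fusion, so the sigmoid gate in `fuse` is an assumption, and the function names, the toy feature values, and the gate logits are all hypothetical stand-ins for quantities a trained CNN-LSTM would produce.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weighted_local(cnn_feats, attn_scores):
    """Scale each patch's CNN feature vector by its softmax attention weight."""
    weights = softmax(attn_scores)
    local = [[w * x for x in feat] for w, feat in zip(weights, cnn_feats)]
    return weights, local

def fuse(local_feat, global_feat, gate_logit):
    """Assumed fusion: a sigmoid gate mixes local and global features for one patch."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * l + (1.0 - g) * gl for l, gl in zip(local_feat, global_feat)]

# Toy inputs: 3 patches with 2-dimensional features (made-up values).
cnn_feats = [[1.0, 2.0], [0.5, 0.5], [2.0, 0.0]]
attn_scores = [0.0, 0.0, 0.0]   # stand-in for per-patch LSTM outputs
global_feat = [1.0, 1.0]        # stand-in for the last LSTM cell state
gate_logits = [0.0, 0.0, 0.0]   # stand-in for learned per-patch gate values

weights, local = attention_weighted_local(cnn_feats, attn_scores)
fused = [fuse(l, global_feat, z) for l, z in zip(local, gate_logits)]
```

With equal attention scores every patch receives the same weight, and a gate logit of zero mixes local and global features evenly. In a trained model the scores and gates would differ per patch, which is what makes the weighting dynamic.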